AITopics | isolation kernel

Collaborating Authors

isolation kernel

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Mass Distribution versus Density Distribution in the Context of Clustering

Ting, Kai Ming, Zhu, Ye, Zhang, Hang, Liang, Tianrun

arXiv.org Machine LearningJan-26-2026

This paper investigates two fundamental descriptors of data, i.e., density distribution versus mass distribution, in the context of clustering. Density distribution has been the de facto descriptor of data distribution since the introduction of statistics. We show that density distribution has its fundamental limitation -- high-density bias, irrespective of the algorithms used to perform clustering. Existing density-based clustering algorithms have employed different algorithmic means to counter the effect of the high-density bias with some success, but the fundamental limitation of using density distribution remains an obstacle to discovering clusters of arbitrary shapes, sizes and densities. Using the mass distribution as a better foundation, we propose a new algorithm which maximizes the total mass of all clusters, called mass-maximization clustering (MMC). The algorithm can be easily changed to maximize the total density of all clusters in order to examine the fundamental limitation of using density distribution versus mass distribution. The key advantage of the MMC over the density-maximization clustering is that the maximization is conducted without a bias towards dense clusters.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2601.10759

Country:

North America > United States (0.46)
Asia (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

MIK: Modified Isolation Kernel for Biological Sequence Visualization, Classification, and Clustering

Ali, Sarwan, Chourasia, Prakash, Mansoor, Haris, koirala, Bipin, Patterson, Murray

arXiv.org Machine LearningOct-21-2024

The t-Distributed Stochastic Neighbor Embedding (t-SNE) has emerged as a popular dimensionality reduction technique for visualizing high-dimensional data. It computes pairwise similarities between data points by default using an RBF kernel and random initialization (in low-dimensional space), which successfully captures the overall structure but may struggle to preserve the local structure efficiently. This research proposes a novel approach called the Modified Isolation Kernel (MIK) as an alternative to the Gaussian kernel, which is built upon the concept of the Isolation Kernel. MIK uses adaptive density estimation to capture local structures more accurately and integrates robustness measures. It also assigns higher similarity values to nearby points and lower values to distant points. Comparative research using the normal Gaussian kernel, the isolation kernel, and several initialization techniques, including random, PCA, and random walk initializations, are used to assess the proposed approach (MIK). Additionally, we compare the computational efficiency of all $3$ kernels with $3$ different initialization methods. Our experimental results demonstrate several advantages of the proposed kernel (MIK) and initialization method selection. It exhibits improved preservation of the local and global structure and enables better visualization of clusters and subclusters in the embedded space. These findings contribute to advancing dimensionality reduction techniques and provide researchers and practitioners with an effective tool for data exploration, visualization, and analysis in various domains.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2410.15688

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.70)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Improving the Effectiveness and Efficiency of Stochastic Neighbour Embedding with Isolation Kernel

Zhu, Ye | Ting, Kai Ming (Nanjing University)

Journal of Artificial Intelligence ResearchAug-2-2021

This paper presents a new insight into improving the performance of Stochastic Neighbour Embedding (t-SNE) by using Isolation kernel instead of Gaussian kernel. Isolation kernel outperforms Gaussian kernel in two aspects. First, the use of Isolation kernel in t-SNE overcomes the drawback of misrepresenting some structures in the data, which often occurs when Gaussian kernel is applied in t-SNE. This is because Gaussian kernel determines each local bandwidth based on one local point only, while Isolation kernel is derived directly from the data based on space partitioning. Second, the use of Isolation kernel yields a more efficient similarity computation because data-dependent Isolation kernel has only one parameter that needs to be tuned. In contrast, the use of data-independent Gaussian kernel increases the computational cost by determining n bandwidths for a dataset of n points. As the root cause of these deficiencies in t-SNE is Gaussian kernel, we show that simply replacing Gaussian kernel with Isolation kernel in t-SNE significantly improves the quality of the final visualisation output (without creating misrepresented structures) and removes one key obstacle that prevents t-SNE from processing large datasets. Moreover, Isolation kernel enables t-SNE to deal with large-scale datasets in less runtime without trading off accuracy, unlike existing methods in speeding up t-SNE.

gaussian kernel, isolation kernel, kernel, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.12904

AI Access Foundation

12904

Journal of Artificial Intelligence Research

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Oceania > Australia > Victoria (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

The Impact of Isolation Kernel on Agglomerative Hierarchical Clustering Algorithms

Han, Xin, Zhu, Ye, Ting, Kai Ming, Li, Gang

arXiv.org Artificial IntelligenceOct-12-2020

Agglomerative hierarchical clustering (AHC) is one of the popular clustering approaches. Existing AHC methods, which are based on a distance measure, have one key issue: it has difficulty in identifying adjacent clusters with varied densities, regardless of the cluster extraction methods applied on the resultant dendrogram. In this paper, we identify the root cause of this issue and show that the use of a data-dependent kernel (instead of distance or existing kernel) provides an effective means to address it. We analyse the condition under which existing AHC methods fail to extract clusters effectively; and the reason why the data-dependent kernel is an effective remedy. This leads to a new approach to kernerlise existing hierarchical clustering algorithms such as existing traditional AHC algorithms, HDBSCAN, GDL and PHA. In each of these algorithms, our empirical evaluation shows that a recently introduced Isolation Kernel produces a higher quality or purer dendrogram than distance, Gaussian Kernel and adaptive Gaussian Kernel.

algorithm, dendrogram, linkage function, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.patcog.2023.109517

2010.05473

Country:

Oceania > Australia (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Kansas (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Weekly Machine Learning Research Paper Reading List -- #9

#artificialintelligenceSep-29-2020, 18:25:05 GMT

This paper investigates data dependent kernels that are derived directly from data. This has been an outstanding issue for about two decades which hampered the development of kernel-based methods. We introduce Isolation Kernel which is solely dependent on data distribution, requiring neither class information nor explicit learning to be a classifier. In contrast, existing data dependent kernels rely heavily on class information and explicit learning to produce a classifier. We show that Isolation Kernel approximates well to a data independent kernel function called Laplacian kernel under uniform density distribution. With this revelation, Isolation Kernel can be viewed as a data dependent kernel that adapts a data independent kernel to the structure of a dataset.

artificial intelligence, isolation kernel, weekly machine learning research paper, (5 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Clustering based on Point-Set Kernel

Ting, Kai Ming, Wells, Jonathan R., Zhu, Ye

arXiv.org Machine LearningFeb-13-2020

Measuring similarity between two objects is the core operation in existing cluster analyses in grouping similar objects into clusters. Cluster analyses have been applied to a number of applications, including image segmentation, social network analysis, and computational biology. This paper introduces a new similarity measure called point-set kernel which computes the similarity between an object and a sample of objects generated from an unknown distribution. The proposed clustering procedure utilizes this new measure to characterize both the typical point of every cluster and the cluster grown from the typical point. We show that the new clustering procedure is both effective and efficient such that it can deal with large scale datasets. In contrast, existing clustering algorithms are either efficient or effective; and even efficient ones have difficulty dealing with large scale datasets without special hardware. We show that the proposed algorithm is more effective and runs orders of magnitude faster than the state-of-the-art density-peak clustering and scalable kernel k-means clustering when applying to datasets of millions of data points, on commonly used computing machines.

artificial intelligence, kernel k-means, machine learning, (17 more...)

arXiv.org Machine Learning

2002.05815

Country:

Oceania > Australia (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
North America > United States > Massachusetts > Middlesex County > Newton (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)

Add feedback

Isolation Kernel: The X Factor in Efficient and Effective Large Scale Online Kernel Learning

Ting, Kai Ming, Wells, Jonathan R., Washio, Takashi

arXiv.org Machine LearningJul-1-2019

Large scale online kernel learning aims to build an efficient and scalable kernel-based predictive model incrementally from a sequence of potentially infinite data points. To achieve this aim, the method must be able to deal with a potentially infinite number of support vectors. The current state-of-the-art is unable to deal with even a moderate number of support vectors. This paper identifies the root cause of the current methods, i.e., the type of kernel used which has a feature map of infinite dimensionality. With this revelation and together with our discovery that a recently introduced Isolation Kernel has a finite feature map, to achieve the above aim of large scale online kernel learning becomes extremely simple---simply use Isolation Kernel instead of kernels having infinite feature map. We show for the first time that online kernel learning is able to deal with a potentially infinite number of support vectors.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1907.01104

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.92)

Add feedback

Nearest-Neighbour-Induced Isolation Similarity and its Impact on Density-Based Clustering

Qin, Xiaoyu, Ting, Kai Ming, Zhu, Ye, Lee, Vincent CS

arXiv.org Machine LearningJun-30-2019

A recent proposal of data dependent similarity called Isolation Kernel/Similarity has enabled SVM to produce better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity; and propose a nearest neighbour method instead. We formally prove the characteristic of Isolation Similarity with the use of the proposed method. The impact of Isolation Similarity on density-based clustering is studied here. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly uplifted to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure with the proposed nearest-neighbour-induced Isolation Similarity in DBSCAN, leaving the rest of the procedure unchanged. A new type of clusters called mass-connected clusters is formally defined. We show that DBSCAN, which detects density-connected clusters, becomes one which detects mass-connected clusters, when the distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected, while density-connected clusters cannot.

dataset, dbscan, isolation similarity, (14 more...)

arXiv.org Machine Learning

1907.00378

Country:

Oceania > Australia > Victoria (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Improving Stochastic Neighbour Embedding fundamentally with a well-defined data-dependent kernel

Zhu, Ye, Ting, Kai Ming

arXiv.org Artificial IntelligenceJun-24-2019

We identify a fundamental issue in the popular Stochastic Neighbour Embedding (SNE and t-SNE), i.e., the "learned" similarity of any two points in high-dimensional space is not defined and cannot be computed. It underlines two previously unexplored issues in the algorithm which have undermined the quality of its final visualisation output and its ability to process large datasets. The issues are: (a) the reference probability in high-dimensional space is set based on entropy which has undefined relation with local density; and (b) the use of data independent kernel which leads to the need to determine n bandwidths for a dataset of n points. This paper establishes a principle to set the reference probability via a data-dependent kernel which has a well-defined kernel characteristic that linked directly to local density. A solution based on a recent data-dependent kernel called Isolation Kernel addresses the fundamental issue as well as its two ensuing issues. As a result, it significantly improves the quality of the final visualisation output and removes one obstacle that prevents t-SNE from processing large datasets. The solution is extremely simple, i.e., simply replacing the existing data independent kernel with Isolation Kernel, leaving the rest of the t-SNE procedure unchanged.

artificial intelligence, kernel, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1906.09744

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)

Add feedback